Multiple imputation models should incorporate the outcome in the model of interest.

Authors

  • Jonathan W Bartlett
  • Chris Frost
  • James R Carpenter
Abstract

Sir, In a recent publication in Brain, Jack Jr et al. (2010) reported on the value of hippocampal atrophy and amyloid-β measures in predicting conversion from mild cognitive impairment to Alzheimer’s disease. The authors used data from 218 subjects in the Alzheimer’s Disease Neuroimaging Initiative with mild cognitive impairment, who had a measure of amyloid-β either through CSF amyloid-β42 or Pittsburgh compound B positron emission tomography imaging (PIB-PET). Of the 218 subjects, only 53 (24%) had PIB-PET available, and so Jack Jr et al. (2010) used multiple imputation for measurement error correction (following Cole et al., 2006) to impute the missing PIB-PET values, based on each subject’s CSF amyloid-β42 and Apolipoprotein E (APOE) ε4 status. The imputation model was fitted using data from a calibration data set of 41 subjects who had both PIB-PET and CSF amyloid-β42 data available. The fitted imputation model was then used to impute 100 ‘completed’ data sets, each with no missing PIB-PET values. In line with standard multiple imputation methodology, a Cox proportional hazards model was then fitted to each imputed data set, relating time to conversion to Alzheimer’s disease to amyloid-β load (as measured by PIB-PET) and atrophy, and the results were combined using Rubin’s rules for final inference. Our concern focuses on the imputation model used by Jack Jr et al. (2010), which may be mis-specified since it did not include variables representing conversion status and time to conversion or last follow-up (the outcome of interest). In general, omitting the outcome from the imputation model results in biased estimates (Moons et al., 2006; Sterne et al., 2009). Indeed, Cole et al. (2006) included the censoring indicator and logarithm of time to event as covariates in their imputation model (Appendix 2).
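The pooling step referred to above (Rubin's rules) can be sketched in a few lines. This is a minimal illustration for a single scalar parameter, such as one log hazard ratio from the Cox model; the function name `pool_rubin` and the three estimates and variances are made-up values for demonstration, not results from Jack Jr et al. (2010) or from this letter.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool one scalar parameter across m imputed data sets (Rubin's rules).

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (pooled estimate, total variance).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    if m < 2 or u.size != m:
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = q.mean()                  # pooled point estimate
    u_bar = u.mean()                  # within-imputation variance
    b = q.var(ddof=1)                 # between-imputation variance
    total = u_bar + (1 + 1 / m) * b   # total variance
    return q_bar, total

# Hypothetical log hazard ratios and variances from m = 3 imputed data sets
log_hr = [0.50, 0.60, 0.40]
var_log_hr = [0.04, 0.05, 0.03]
pooled, total_var = pool_rubin(log_hr, var_log_hr)
print(pooled, total_var)
```

The between-imputation term is what carries the uncertainty due to the missing values into the final standard error; in practice far more than three imputations would be used, as in the 100 imputed data sets described above.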
Recently, it has been shown that a more accurate approach is to use an estimate of the baseline cumulative hazard function as covariate rather than log time (White and Royston, 2009). In the present context, the extent of the bias induced by omitting the outcome from the imputation model depends on the percentage of variation (R²) in PIB-PET explained by CSF amyloid-β42 and APOE ε4. Given the imputation model coefficients and residual standard deviation reported by Jack Jr et al. (2010), and the variance and correlation of CSF amyloid-β42 and APOE ε4 in mild cognitive impairment subjects in the Alzheimer’s Disease Neuroimaging Initiative, we estimate that R² was 80% and hence the extent of the bias was likely to be relatively small. Nevertheless, an analysis that included the outcome is equally feasible computationally, is likely to be less biased and is thus preferable. More generally, the effect of omitting the outcome variable from the imputation model is not always small (Sterne et al., 2009). A striking example of the dangers occurred in the development of the UK QRISK cardiovascular risk score (Hippisley-Cox et al., 2007b), in which missing cholesterol values were imputed using multiple imputation. Surprisingly, serum cholesterol ratio was found to have no independent effect on risk of cardiovascular disease. The authors subsequently clarified that the censoring indicator had been inadvertently omitted from the imputation model, and a re-analysis using an improved imputation model did result in an independent effect of cholesterol (Hippisley-Cox et al., 2007a). In summary, we emphasize that multiple imputation is a powerful statistical tool for the analysis of partially observed data that can alleviate biases and recover information. However, the validity ...

doi:10.1093/brain/awr061 Brain 2011: 134; 1–2 | e189
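The abstract's suggestion to use an estimate of the cumulative hazard, together with the censoring indicator, as imputation covariates can be illustrated with a small sketch. It assumes the Nelson-Aalen estimator as the cumulative hazard estimate; the function name `nelson_aalen` and the five follow-up times are hypothetical toy values, not data from the study.

```python
import numpy as np

def nelson_aalen(time, event):
    """Nelson-Aalen cumulative hazard, evaluated at each subject's own time.

    time:  follow-up time (to event or censoring) per subject
    event: 1 if the event was observed, 0 if censored
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])
    if event_times.size == 0:
        return np.zeros_like(time)
    # Hazard increment at each distinct event time: events / number at risk
    increments = np.array([
        np.sum((time == t) & (event == 1)) / np.sum(time >= t)
        for t in event_times
    ])
    cum_hazard = np.cumsum(increments)
    # Number of distinct event times that are <= each subject's time
    idx = np.searchsorted(event_times, time, side="right")
    return np.where(idx > 0, cum_hazard[np.maximum(idx - 1, 0)], 0.0)

# Toy data (made up): time to conversion or last follow-up, and event indicator
time = [2, 3, 3, 5, 8]
event = [1, 0, 1, 1, 0]
H = nelson_aalen(time, event)

# Outcome columns that an imputation model for a missing covariate would include
outcome_covariates = np.column_stack([event, H])
print(outcome_covariates)
```

The two columns of `outcome_covariates` would be added to the other predictors in the imputation regression, so that imputed values retain their association with the outcome.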

Similar resources

Accelerated proliferation correction factors in linear-quadratic and multiple-component models

Background: This study was designed to incorporate accelerated proliferation correction factors into linear-quadratic and multiple-component models. Materials and Methods: An accelerated proliferation rate correction factor has been incorporated into the linear-quadratic and the multiple-component models by applying accelerated exponential cell growth to explain the tumor cell kinetics and estimates proper ...

Several approaches to handling missing values of quantitative variables and their effect on the results of a clinical trial

Background and Objectives: A major challenge in longitudinal studies is the problem of missing data. Missing data may cause the loss of part of the information, which reduces the accuracy of the estimator and can make the results biased and inaccurate. Therefore, it is necessary to evaluate the missing data mechanism in longitudinal research and to consider thi...

Outcome-sensitive multiple imputation: a simulation study

BACKGROUND Multiple imputation is frequently used to deal with missing data in healthcare research. Although it is known that the outcome should be included in the imputation model when imputing missing covariate values, it is not known whether it should be imputed. Similarly no clear recommendations exist on: the utility of incorporating a secondary outcome, if available, in the imputation mod...

Selection of Variables that Influence Drug Injection in Prison: Comparison of Methods with Multiple Imputed Data Sets

Background: Prisoners, compared to the general population, are at greater risk of infection. Drug injection is the main route of HIV transmission, in particular in Iran. What would be of interest is to determine variables that govern drug injection among prisoners. However, one of the issues that challenge model building is incomplete national data sets. In this paper, we addressed the process ...

Accuracy evaluation of different statistical and geostatistical censored data imputation approaches (Case study: Sari Gunay gold deposit)

Most of the geochemical datasets include missing data with different portions and this may cause a significant problem in geostatistical modeling or multivariate analysis of the data. Therefore, it is common to impute the missing data in most of geochemical studies. In this study, three approaches called half detection (HD), multiple imputation (MI), and the cosimulation based on Markov model 2...

An Empirical Comparison of Performance of the Unified Approach to Linearization of Variance Estimation after Imputation with Some Other Methods

Imputation is one of the most common methods to reduce item non-response effects. Imputation results in a complete data set, and then it is possible to use naïve estimators. After using most common imputation methods, mean and total (imputation estimators) are still unbiased. However, their variances (imputation variances) are underestimated by naïve variance estimators. Sampling mechanism an...

Journal title:
  • Brain : a journal of neurology

Volume 134, Pt 11

Publication date: 2011